Statistical Morph Analyzer (SMA++) for Indian Languages

نویسندگان

  • Saikrishna Srirampur
  • Ravi Chandibhamar
  • Radhika Mamidi
چکیده

Statistical morph analyzers have proved to be highly accurate while being comparatively easier to maintain than rule based approaches. Our morph analyzer (SMA++) is an improvement over the statistical morph analyzer (SMA) described in Malladi and Mannem (2013). SMA++ predicts the gender, number, person, case (GNPC) and the lemma (L) of a given token. We modified the SMA in Malladi and Mannem (2013), by adding some rich machine learning features. The feature set was chosen specifically to suit the characteristics of Indian Languages. In this paper we apply SMA++ to four Indian languages viz. Hindi, Urdu, Telugu and Tamil. Hindi and Urdu belong to the Indic1 language family. Telugu and Tamil belong to the Dravidian2 language family. We compare SMA++ with some state-of-art statistical morph analyzers viz. Morfette in Chrupała et al. (2008) and SMA in Malladi and Mannem (2013). In all four languages, our system performs better than the above mentioned state-of-art SMAs.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Context Based Statistical Morphological Analyzer and its Effect on Hindi Dependency Parsing

This paper revisits the work of (Malladi and Mannem, 2013) which focused on building a Statistical Morphological Analyzer (SMA) for Hindi and compares the performance of SMA with other existing statistical analyzer, Morfette. We shall evaluate SMA in various experiment scenarios and look at how it performs for unseen words. The later part of the paper presents the effect of the predicted morph ...

متن کامل

Morphological Analyzer for Gujarati using Paradigm based approach with Knowledge based and Statistical Methods

Morphological Analyzer is a tool which performs syntactic analysis of a word and finds root form of input inflected word form. Morph analyzer serves as a pre-processing tool for many NLP applications. Significant amount of work has been done in this area for many Indian languages but not much work has been reported for Gujarati language. We present Morph analyzer for Gujarati language. The Morp...

متن کامل

A Java Implementation of an Extended Word Alignment Algorithm Based on the IBM Models

In recent years statistical word alignment models have been widely used for various Natural Language Processing (NLP) problems. In this paper we describe a platform independent and object oriented implementation (in Java) of a word alignment algorithm. This algorithm is based on the first three IBM models. This is an ongoing work in which we are trying to explore the possible enhancements to th...

متن کامل

Unsupervised Improvement of Morphological Analyzer for Inflectionally Rich Languages

This paper presents an algorithm for unsupervised learning of morphological analysis and generation of in ectionally rich languages like Hindi, given a low coverage morph and a corpus of raw text. It assumes no particular theoretical model of morph, but can work with any morph that de nes classes of stem that behave similarly. The morph learning algorithm uses the concept of 'observable paradig...

متن کامل

Unsupervised morph segmentation and statistical language models for vocabulary expansion

This work explores the use of unsupervised morph segmentation along with statistical language models for the task of vocabulary expansion. Unsupervised vocabulary expansion has large potential for improving vocabulary coverage and performance in different natural language processing tasks, especially in lessresourced settings on morphologically rich languages. We propose a combination of unsupe...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014